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1. Introduction 

Discrete stochastic models in continuous time are the simplest way to turn survival 
analysis of an ecological system into a simulation of that system. They build models most 
faithful to a picture of individuals making choices which affect each other. This article 
describes how rules for individual behavior can construct a simulation of a population. It 
argues that, within the context of discrete stochastic simulation, hazard rates for transitions 
are intrinsic descriptions of individual behavior^] 

Discrete stochastic models in continuous time are associated with the Gillespie algorithm 
and its variants, including the Next Reaction algorithm. The Gillespie algorithm samples to 
find the next state and time of a stochastic process. The process and sampling algorithm are 
separate. This article won’t define a new sampling algorithm but will examine the stochastic 
process that is sampled. 

Almost every discrete stochastic model in continuous time used for ecology is based 
upon a chemical kinetics model. These chemical kinetics models take the form of chemical 
species which interact according to stoichiometry at rates specified by propensities. They 
have been enormously successful for ecological applications such as stochastic movement 
models p :2|[3p3] and any model that builds group behavior from individual behavior [5j|6j[7]. 
Clear statement of chemical kinetics as a process leads to clear specification of chemical 
kinetics models as matrices and vectors in Systems Biology Markup Language and computer 
code which, in turn, leads to their utility to solve common problems. 

The chemical kinetics model constrains expression of ecological problems in two ways. 
The first is that every state in a chemical kinetics model is a count of chemical species when 
an ecologist might want the state of an individual to carry properties that record its life 
history. The second is that propensities are usually simple rates when an ecologist might 
want rates to be time-dependent, so that an insect gets more hungry over time, or a cow 
waits to reproduce again, or a frog jumps quickly if it jumps early. These two important 
modifications are that individuals hold state and their hazard rates for change are time- 
dependent. 


* Corresponding author 

1 Abbreviations used in this article: MRP for Markov renewal process, LLCP for long-lived competing 
process, GSPN for generalized semi-Markov Petri net, and GSMP for generalized semi-Markov process. 
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Two particular areas are ripe for use of non-constant hazard rates and hazard rates more 
specific to individual histories. The first is stochastic movement models, which already have 
a strong history [U P El i]. The second is infectious processes. Recent studies show that 
while constant hazards for infection model well the most intense parts of outbreaks any 
analysis of early outbreaks and the distribution of early outbreaks will find incorrect Rq if 
they do not include Gamma-distributed or Weibull-distributed recovery timesjHIIS]. 

The statistical process defined in this article includes the ability for individuals to have 
state associated with them and the ability for hazard rates to be time-dependent, and it is 
called long-lived competing processes (llcp). While it is known that the Gillespie algorithm 
can sample such an llcp process, the process, itself, has not been specified. Lacking a 
specification, previous uses of time-dependent hazards and property-carrying state have 
misunderstood both how models relate with observation and how to sample the trajectory 
from such a model. The bovine viral diarrhea models of Viet et al. [10] demonstrate how time- 
dependent hazards can include farm management decisions within a continuous-time model, 
but they also mistakenly parameterize transitions in the model from holding times instead 
of hazard rates, both of which are explained in the results section below. The wasp model 
of Ewing et al. m calculates competition among behavioral drives, including temperature 
dependence, using a detailed continuous-time model with time-dependent hazards, but the 
sampling method used is statistically biased. In both cases, the resulting trajectories are 
biased, so that summary data, such as average survival, are incorrect. Correctness here is a 
question of whether observation times and theoretical rates put into a model are the same 
that come out of simulation of the model. Correctness is a question of whether the statistical 
machinery fails the effort required to gather data and parameterize a model. 

The other consequence of lacking a specification is that there are no ready tools with 
which to construct a discrete stochastic process in continuous time which has time-dependent 
hazard rates. A mathematical specification is the Erst step to understanding how to state a 
process clearly in a hie or in code, such that it can be sampled. It was the goal of the authors 
to make construction of ecological and epidemiological models trustworthy and rapid. 

The central tenet of the stochastic processes defined in this article is that individuals 
compete. Not only do individuals compete, but also for a single individual, the drive to eat, 
the drive to reproduce, and the drive to move compete. For these models, an individual is 
the sum of its drives. Each of those drives is modeled by a separate stochastic process whose 
hazard rate conies from survival analysis. The long-lived competing process is built from a 
set of conditionally-independent stochastic processes, conditional on interruption by other 
processes. Each process has its own time-dependent hazard to fire. 

To understand the aptness of a competing process formulation take a classic compartmen- 
tal diagram, such as the vector and host model in Fig. [lj shows which state transitions are 
possible for an individual. This diagram, given rates for transitions, could guide construc¬ 
tion of a differential equation model for a population, but think here modeling individual 
behaviors. The diagram in Fig. [2] represents rules for how an individual can change state 
from one compartment to another and at what rate the state can change. It emphasizes 
possible transitions from which move those individuals from one compartmental state to the 
next. Sec. |3.3| shows how to make this diagram into a specification for a model in the same 
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Figure 1: A simple compartmental model showing a vector infecting a host. 



Figure 2: This diagram emphasizes transitions between states because each transition is associated with a 
stochastic process. Host states label susceptible, infected, and immune. Vector state label susceptible and 
infected. 

way that a stoichiometric matrix is a specification for a chemical model. 

The methods section of this article and appendix construct a representation of a statis¬ 
tical process in order to show generality and completeness. The results section covers how 
to construct and parameterize this process. 

2. Previous Work 

This article defines a class of stochastic processes without discussing the algorithms with 
which to sample trajectories of those processes. These algorithms are commonly known as 
Gillespie-type and encompass the following: Direct, Direct non-Markovian, First Reaction, 
Next Reaction, Anderson’s Next Reaction [I2l T3j, ITT , To] . This article doesn’t extend these 
algorithms but rather expands the possible ways to define changes of state, given Gillespie- 
type dynamics in time. 

While the exposition within this article relies on calculus of stochastic variables, a more 
modern approach to discussion of Markov renewal processes uses martingales and random 
time changes, as described by Anderson and Kurtz [16]. Restricting mathematical develop¬ 
ment to calculus is more approachable but will falter to answer more precise questions about 
what it means for a competing process to be well-defined. 

There is a mathematical identity that any discrete, stochastic, continuous-time process 
can be rewritten as a set of competing processes, one for each possible next state and 
defined anew at each time step[T7[ [18]. The long-lived competing processes described below 
are an extension of competing processes to encompass biological behaviors which aren’t 
reset each time anything in the system changes, as happens for competing processes. It can 
be considered a subset of an older technique called the Generalized Semi-Markov Process 
(gsmp), used in engineering [19j. The main criticisms of GSMP simulations are that they 
can be complicated to specify and slow to simulate. Restriction of the long-lived competing 
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processes to conform to Markov renewal processes reduces complexity somewhat and greatly 
speeds computation by making possible the use of Next Reaction method. 

The Generalized Stochastic Petri Net (gspn) in Sec. 3.3 is not only a data structure but 
also a set of engineering techniques for building a process using that data structure[201 121] . 
Those rules are more amenable to human-mediated processes and are a superset of what is 
described here. This article focuses on simulation as enactment of estimations from survival 
analysis, so it skips over much of the extensive feature set, and complication, associated 
with use of GSPN in manufacturing and reliability analysis. 

There have been major frameworks designed and used for general individual-based mod¬ 
eling in ecology. The DEVS discrete-event simulation is a general-purpose simulation tool 
adopted for ecology[22]. Later groups made tools more focused on ecological simulation 
in order to reduce the burden of programming and design. OSIRIS and WESP are good 
examples[231 [23]. Both focused on ease-of-use and correctness by combining higher-level 
abstraction with software architecture. 

In contrast to these frameworks, the work presented here is a simpler mathematical 
model. There is a sample implementation as a C++ library, and references to data structures 
and algorithms are sufficient to implement the model[23]. The emphasis, however, is on 
how to join statistical estimation of a system with its simulation because understanding 
causal dependence of a system is more complicated than programming, and quantitative 
understanding begins with statistical correctness. 


3. Methods 

3.1. Markov Renewal Process 

To call a simulation a discrete stochastic process in continuous time, means that it is 
a representation of a Markov renewal process (mrp). Take the MRP as the definition of 
what it means to be such a simulation. The structure of an MRP determines in what way 
a model approximates the real world. It also circumscribes limitations on what choices 
in a simulation conform to mathematical requirements. The MRP is well-known and this 
exposition follows Cinlar[26] in order to explain how all of the actions of every individual in 
a simulation, taken together, form one single MRP from which to sample trajectories. 

For an MRP, time is a random variable, so it jumps from an initial time, T 0 , to the next 
time, Ti > T 0 . For any jump, from T n to T n+ 1, the state of the system is defined at those 
times, not in-between. The times themselves form a countable set. The state of the system, 
X n , is, itself, a random variable on a discrete space, so it is labeled X n = i. 

The central engine of an MRP is that the next state and the next time are chosen according 
to a joint probability for the next state to be state X n+ i = j at time T n+1 given current 
state X n = i and time T n , 


P [X n+1 = j, T n+ 1 - T n < t\X n = i]. (1) 

The density of this probability, its derivative, is called the semi-Markov kernel, denoted 
qij(T n+ 1 — T n ). A process which conforms to an MRP specifies all the next possible states of 
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Figure 3: A Markov renewal process defines a probability distribution of a next state at a next time given 
the current state. Each circle represents a different state of the system. Each arrow is a transition to a next 
state and time. From now, what is next? 



Figure 4: Given a system with two individuals, A and B, which can move among three metapopulations, 
there are nine physical states of the system, as shown. Because the mrp can depend on times in a state, 
the complete state of the system, X n , can include the times at which A and B arrived in their current 
configuration. 


the system, and, given the current state, the joint probability of each next state and time 
for that state. 

An MRP is a view of an ecological simulation, zoomed out so that anything that happens, 
in any part of the simulation, is the next change to the state of the whole simulation, as 
shown by the states in Fig. [4j Saying that a complex simulation is an instance of an MRP 
is a statement that all of the biological and physical processes within a simulation become 
a single statistical process, marching forward in time, according to a single joint probability 
distribution. Identification of that joint probability distribution is what permits uniform 
sampling in a simulation. 

There is no restriction for an MRP that the probability distribution for the next states be 
exponential. For every Markov random process, the next state depends only on the system 
state X n at T n . For a Markov random process whose distribution of next stopping times 
is always exponential, the probability of arrival at a particular next state not only doesn’t 
depend on states before X n but also doesn’t depend on the amount of time the system takes 
to arrive at that next state. 
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Figure 5: Long-lived competing processes define individual actions which, together, create a Markov renewal 
process. Only one long-lived process fires at a time. Others may be enabled or disabled when one fires, but 
at no other time. There isn’t a one-to-one association between a competing process and the state of the 
system. Each process modifies the joint substates of the system in some known way to cause a change in 
the current state. 

The exposition here treats the state space, X n , as a finite set of states, which would be 
restrictive in practice. The number of states can be infinite as long as the next possible 
states, X n+ i = j can be sampled statistically. Qinlar’s careful presentation allows states 
to be real numbers, explaining that, given a discrete set of stopping times, the set of real 
numbers chosen to be stopping times is itself a finite set. 

A process which conforms to an MRP must define a set of initial states, a set of next 
states given any current state, and a joint probability for the next states and times at which 
they might occur. The next section constructs such a process, starting with intuition for 
competing processes. 

3.2. Long-lived Competing Processes 

The guiding idea for constructing this specific MRP is that individuals compete for re¬ 
sources and modify the environment. There will be multiple simultaneous stochastic pro¬ 
cesses, each of which represents a tendency towards changing the individual or the environ¬ 
ment. Definition of long-lived competing processes (llcp) proceeds in two steps, defining 
the state of the overall system, and showing how to calculate the probability of the next 
state and time, Eq. [lj from current state. 

The state of a system with many individuals, such as the metapopulation in Fig. [6j is 
exponentially large (the number of trajectories given an initial value will be combinatorially 
large). Any time any one of the individuals moves, sthe whole system is in a new state, 
X n+ \ = j. The total number of states is the number of arrangements of individuals, and 
times at which they could arrive at those arrangements. A goal of this the LLCP specification 
is to ensure that the behavior of any particular individual can depend on only a local 
environment. Whether its local environment is its location or the its neighboring individuals, 
that individual’s behavior will be independent of any behavior that doesn’t interfere with 
its local environment. 

The physical state of the system is the state of individuals and their environment, separate 
from times at which the system entered this physical state. This definition follows the 
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Figure 6: The same metapopulation simulation of A and B moving among three locations could be modeled 
with three physical substates, one for each metapopulation, and six processes which compete to move an 
individual to neighboring metapopulations. The competing processes are enabled only when a substate has 
individuals to move. 
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Figure 7: Long-lived competing processes are defined in absolute time since the start of the simulation 
because they represent what is allowed within a model. When a long-lived competing process is enabled, it 
has nonzero hazard. When it fires or becomes disabled, the hazard returns to zero. 


language of the gsmp in Glynn[19jJ. It is specified as a set of substates, so, si, ... s p , so that 
those substates can represent the local environment of an individual. This could be the 
location of each individual or nearby food, for example. The substates are a disjoint set 
which, together, describe the whole physical state of the MRP. 

Each possible cause for change in the system is represented by a separate stochastic 
variable, L m . For instance, “individual two jumps down” would be a single cause for change. 
At the moment the individual arrives in a spot, the cause to jump away from that spot is 
called enabled at time T e m . That cause may fire, or a competing cause may disable it. 
For instance, the firing of “individual two jumps right” would prevent that individual from 
having jumped North. 

The probability for a cause to fire at any time depends on some subset, {s*}, of the 
substates of the physical system and can always be written as a cumulative distribution 
function defined by a time-dependent hazard rate, X m (t), 


p [Lm < T m |{si}] = 1 - exp 



X m (yS^ds 


( 2 ) 


This is the probability for the time at which a particular cause will happen, were it to 
happen. It is defined, not since the last transition, but in absolute time, since the start of a 
simulation. Associated with this cause is a rule for when it is enabled, when it is disabled, 
and what it does to the substates when it fires. Those three rules are functions of the 
substates associated with this cause and no other physical substates. 

A simulation defines many causes simultaneously. From all of these causes, the first to 
fire determines what happens next in the system. The next time, T n+ 1 , for the whole system 
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as an MRP is the minimum of the set of stochastic variables, 


T n+ 1 = min ({ L m }), (3) 

taking into account that none of the enabled causes fired before T n . It’s a competition 
to be the first to fire, determined by the cause-specific probabilities for bring. When one 
bres, it may disable or enable other causes. The cause that bres determines how the system 
changes to a new state, X n+ \ = j. Multiple causes may independently ebect the same new 
state, in which case they, together, dehne the all-causes probability of that state. The joint 
probability for X n+ i and T n+ 1 is given in appendix A. 

No cause that is competing, in the sense that the bring of another cause could disable 
it, will ever bre with the probability distribution given in Eq. [2] However, the long-lived 
competing process guarantees that the hazard rate of a cause, as determined by standard 
survival analysis, will match the hazard rate of Eq. [2j because the hazard rate is the rate at 
which a cause will bre, were it to bre. To debne an llcp, 

• Debne physical substates, s 0 , Si,... s p , 

• For each cause, debne 

— An enabling function on the substates which determines enabling and, if enabled, 
returns the stochastic variable, L m , and 

— A bring function, which modibes substates. 

• Substates and causes form an undirected, bipartite graph. 

Haas covers the possibility of incomplete distributions for a similar process, the GSMPplj. 
If there is some chance that the biological process associated with a cause might never 
happen, even in the absence of competition, then the probability in Eq. [2] will be incomplete, 
meaning it will not sum to one. Only if there is some probability that no cause in the system 
bres, so that the simulation stops entirely, will the probability of Eq. [I] be incomplete. Most 
often it will sum to one or be zero entirely when the system reaches what is called an 
absorbing state. In an ecological simulation, an absorbing state could mean all individuals 
are recovered from disease, or it could be a delicate way to say every individual has died. 

A consequence of this construction of an LLCP is that the enabling times, T^, for the 
long-lived processes become part of the state of the system. For simple Markov processes, 
as opposed to mrps, those enabling times don’t abect the choice of the next state but, in 
general, they are important for non-exponential distributions. While the times are real¬ 
valued, the state of the system is discrete because those times are drawn from the set of 
times, T n , at which the system is dehned. 

Within an llcp, no two long-lived processes may bre simultaneously. For instance, no 
two causes could both have distributions in time which bre exactly at 5 pm. It is possible 
to debne a more elaborate process which accounts for such conhict, but it would not as 
clearly obey hazard rates from survival analysis and would require a sampling mechanism 
more complicated than Gillespie-type. 
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Figure 8: The language of the GSPN is that a transition moves tokens among places. The cardinality of the 
moving tokens is stoichiometry. Each token may have properties, which means it may contain state within 
the token. This graph, accompanied by a list of properties of the token and transition, is a specification of 
a simulation. 

It is possible to define an infinite number of physical states and even an infinite number 
of possible transitions (for instance, an ant wandering on sand), as long as the number of 
enabled transitions is finite after each time step. The restriction is that it must be possible 
to sample the central probability, Eq. [l] Computationally, this would require storing only 
that part of the state that is currently relevant to the simulation. 

The long-lived competing process, as a pure expression of simulation by survival analysis, 
can be seen as a stripped-down version of the Generalized Semi-Markov Process (gsmp). 
The main historical criticism of GSMP is that such arbitrary definition of what is a substate 
and when causes are enabled or disabled produces a slow simulation. However, solution for 
the next time step, Eq. [3j is exactly what Gillespie’s algorithm solves, and the tremendous 
speedup from Gibson and Brack’s Next Reaction Method relies on recording which causes 
depend on which physical states. Those two details are what make an llcp more computable 
than an arbitrary continuous-time stochastic system. 

3.3. Generalized Stochastic Petri Net 

A Generalized Stochastic Petri Net (gspn) is a family of stochastic processes and asso¬ 
ciated data structures. This section defines a long-lived competing process using the data 
structure and associated nomenclature of GSPN. Being specific about the data structure 
creates a clear relationship between the firing of a process and how the state of the system 
as a whole changes. 

The state of a system, excluding its transitions, is defined by tokens at places , much like 
the location of checkers on a board defines the state of a game. Each place has a unique 
identifier. The tokens at a place are a physical substate, s*, of the llcp. While chemical 
kinetics simulations usually describe only a count of molecules of a species, individuals in 
an ecological simulation will often have more life history, such as birth times or subspecies, 
which can be represented as properties of a token. 

Transitions move and modify tokens, which is called firing a transition to produce an 
event. The enabling rule and firing rule all obey stoichiometry, which is a count of how many 
tokens are taken from a place or put to a place when a transition fires. This isn’t required 
for the LLCP but has been shown by Haas to produce an equivalent statistical process [213 • If 
tokens do not have properties, then stoichiometry can be sufficient to describe both enabling 
and firing rules. A transition is enabled when the number of tokens at its inputs meets or 


9 





Figure 9: A movement model for a single individual moving among three habitats might have six transitions. 
Coefficients on edges indicate the number of tokens a transition needs to be enabled and where a transition 
moves tokens when it fires, much like stoichiometry in chemical systems. 



Figure 10: The stoichiometry of a GSPN requires movement of tokens. Shown here are two different ways 
to implement a metapopulation movement model which incorporates movement dependence on population 
size. The GSPN on the left defines a separate transition for each configuration of occupancies. The transition 
“move2” defines a hazard for moving individual A when individual B is present. The GSPN on the right 
creates a separate place whose token count represents the number of individuals in the metapopulation. 
This extra state avoids combinatoric explosion of the number of transitions. 


exceeds its required inputs. It is disabled when this is no longer true. A transition, upon 
firing, moves the requisite number of tokens from inputs to outputs. 

When tokens have properties, as they often will for ecological simulation, then the spec¬ 
ification for each transition may be augmented with a rule such as, there must be at least 
one input token with the property that the individual is older than three weeks. Hazard 
rates for firing may be parameterized on properties of any tokens at transition inputs. 

The state of the system is not only the tokens, with their properties and locations, but 
also the set of enabling times for transitions. Because this model permits non-constant 
hazard rates, the enabling times are necessary to know the distribution of future firing times 
for the system of transitions. 

To define a GSPN, 

• Define a set of places. 

• Define tokens which can move among those places and may have properties. 

• Each transition has 

— An enabling function which returns whether the transition is enabled on its places 
and returns the stochastic variable for the firing time of this transition. 

— A stoichiometric coefficient specifying how many tokens are taken from or sent 
to each place upon firing. 
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— A firing function which moves and modifies properties of tokens. 

• Places and transitions form a bipartite graph which is directed by stoichiometry. 

The GSPN is a network in that transitions connect to places which connect to other 
transitions. For most simulations, this network can be created as a graph structure with 
marking at the places and transition enabling times at the transitions. The data on this 
graph is then the state of the model. How a simulation chooses what to fire next is separate 
from the state of the model, and any one of the Gillespie algorithms would yield the same 
ensemble of results. 

4. Results 

This section discusses rational construction of such a stochastic process from real-world 
observations or from mental models of ecological processes. Consistency with the outside 
world can be subtle anywhere in a model that two individuals might make conflicting deci¬ 
sions, which is the point of greatest interest. 

4.1. Choice of States 

An early step in building a model is to decide what can be the states of the system. 
The granularity of a model is a measure of how many behaviors are grouped together. For 
instance, an individual-based model of movement might include each wandering ungulate, 
whereas a more coarse-grained model could represent a single state for the location of a 
whole herd. 

The term “compartmental model” comes from the task of distinguishing types of tissue 
in microscope images. The task is assignment of a continuous observable to a discrete class. 
A state in a GSPN can be tied to a differentiating observation or to a theoretically relevant 
tipping point, such as survival of a metapopulation past early stochastic die-off. Something 
as simple as a state that represents a location still involves choice. For instance, a movement 
model of butterflies going bush to bush might identify a state according to the nearest bush, 
but it might identify a state as the last bush on which the butterfly landed, even if the 
butterfly has since taken off from that bush. The latter choice would correspond to field 
observations which record times at which insects land. 

The second step is to choose how tokens on places represent those states. The combi¬ 
nation of token and place together define the state. A movement model could have a place 
for each location, or it could store the current place as a property of an individual’s token. 
Both choices are valid, but a GSPN which has fewer transitions connecting to the same place 
tends to be more efficient for simulation. Two transitions which depend on the same place 
equate to two biological processes which depend on the same local environment. They are 
coupled, and that coupling means a simulation has to do more work to ensure the currency 
of each process when shared state changes. 

However states are chosen, the choice of which states to define combines with the eco¬ 
logical reality of which transitions among states are possible to create a necessary set of 
transitions to parameterize. 
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4-2. Holding Times and Waiting Times 


The behaviors of an individual become a set of related transitions in a GSPN. Consider 
an example of an individual which arrives at a metapopulation and can move to one of four 
other metapopulations. There are four transitions, one for each movement, and all have the 
same enabling time, the moment the individual arrived at its starting location. A token 
starts in one place, and each transition, were it to fire, would move it to a different place. 
Together, four transitions define the movement behavior of a single individual. 

There are two questions to answer about such a system: what happens next and when 
it happens. These together are the joint probability of Eq. [I] which is Pfwhat, when]. As 
with any joint probability, there are two factorizations. These factorizations are called the 
holding time and the waiting time. 

The holding time formulation asks, given what happens next, when will it happen? This 
factorizes the joint probability as P [what] P [when|what]. Two quantities together specify 
the density of this distribution, the time-independent stochastic probability density, n\j, 
of one outcome or the other, and the time-dependence of when that outcome happens, 
h n j (T — T e ), given that it will happen. This latter quantity is called the holding time. Time 
for a distribution is measured as time since enabling, which is time since the token arrived 
at the current place. 

The waiting time formulation asks, given when the next event happens, what will hap¬ 
pen? This factorizes the joint probability as P [when] P [what|when]. The density of times for 
the next event are called the waiting time, Wi(T — T e ). This information, were it observed 
by itself in an experiment, would be insufficient to determine hazard rates for transitions. 
It must be augmented by the time-dependent stochastic probability, nij(T — T e ), which asks 
the probability of one outcome or another at the given firing time. 

Neither the holding time nor the waiting time are what go into the GSPN. Given mea¬ 
surement or specification of holding times or waiting times, survival analysis reduces them 
to hazard rates, A ij(T — T e ), which are the probability per unit time that an event happens, 
given that it has not yet happened. It is this hazard rate that specifies the distribution 
of a transition in Eq. [2] For the simple example of a token moving to one of some num¬ 
ber of places, the hazard rate can be found from the semi-Markov kernel written either as 
Qij(T ~ T e ) = 'KijhijiT - T e ) or as q^iT - T e ) = w^T - T e )it i:j {T - T e ). From that kernel, 


the hazard rate is 


Ay(r-r e ) 


QijjT — T e ) 

1 - E j Si Tj(s)ds 


(4) 


This simplified analysis of an individual’s choices in the presence of an environment provides 
states and hazards with which to construct a system of many individuals sharing an envi¬ 
ronment. For observed data, a statistical estimator, such as Nelson-Aalen or Kaplan-Meier, 
construct hazards and integrated hazards which properly account for the kind of censoring 
that looks, in a GSPN, like one transition disabling another transition[27]. 

The hazard rate for every transition can be stated equivalently by a transition distribu¬ 
tion, f(t) = X(t) exp(— / X(s)ds). The holding time, waiting time, and transition distribu¬ 
tion are three different probability densities in time. Mistaking holding time distributions 
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for transition distributions will construct a GSPN whose timing isn’t what was measured or 
estimated. If and only if a transition can never be disabled by another transition, so that it 
is independent of all other transitions, will this probability distribution be the same as the 
holding time which will, in turn, be the same as the waiting time. In all interesting cases, 
they differ. 

Construction of a stochastic, continuous-time model can be a balancing act because 
observed rates for transitions, the holding or waiting times, are the result of competition 
among individuals and other ecological processes, whereas simulation is controlled by hazard 
rates. Consequently, a simulation constructed from seemingly reasonable choices of hazard 
rate may show transition rates above or below expectations because of dependency among 
transitions as they compete. If some transition rates are estimated or inferred, they may 
be reparameterized. The common technique of establishing a parameter which is a scaling 
factor on the transition distribution may not be desirable because it scales hazard rate in a 
time-dependent way and because it changes how incomplete the transition is. Incompleteness 
implies a transition would never fire, even in the absence of competition, which may not be 
an intended consequence of rescaling. 

4-3. Markovian Systems 

The chemical kinetic models that are most commonly used for ecology and epidemiology 
are Markovian, which implies that every transition’s distribution is exponential. Exponential 
distributions have the unique property that the hazard rate is constant from the enabling 
time onward, so that once a transition is enabled, it is no longer important to remember 
when it was enabled. A GSPN model whose transitions are all exponentially-distributed 
therefore need not include enabling times in its state. 

Consider two individuals in the same location of a metapopulation model. One GSPN 
representation represents their presence in a location by two tokens on two places. Transi¬ 
tions can move those tokens to the other locations. If both have identical transition rates 
for moving to other locations, then an alternative representation in a GSPN is to place both 
tokens at the same place. The transition’s hazard rate is then the sum of the hazard rates of 
each token, but, at the moment the transition fires, it picks only one token to move. When 
all hazard rates are constant, this technique of coalescing multiple places and transitions 
into one place and one transition can greatly speed computation. It also works, however, 
for time-dependent hazard rates, as long as all coalesced hazard rates have the same time 
dependence, with the same enabling time. 

While an llcp on a GSPN and a Markovian chemical kinetic model both use exponential 
distributions, the LLCP permits more general rules for when transitions are enabled and how 
state changes when it fires. 

4-4- Model as a Graph 

No particular representation of a model as places, tokens, and transitions is unique. There 
can always be another model with different places, tokens or transitions whose trajectory is 
one-to-one equivalent. For instance, we might represent a single insect as a token hopping 
among places, or we might represent it as a token with a property which is its current 
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location. Nevertheless, the form of these models, as graphs whose nodes can be enumerated 
and cataloged, offers a unique chance to construct models incrementally and compare results. 

Given a particular gspn representation of a model, every aspect of the model is cataloged 
as a place, transition, or marking. When comparing another, similar model, some of the 
places, transitions and markings will be the same, and some will be different. It is possible 
to make a list which can be verified computationally. This is a significant improvement over 
more free-form code. 

Composition of models becomes composition of graphs. This might be a hierarchical 
composition, as when individuals with disease states are composed by a contact graph, 
or it could be a composition of an ecological model with an economics model. In either 
case, modular composition of Petri Nets by transition fusion or place fusion is already 
described |281 [29j . Graph fusion techniques offer a formal method for joining GSPN. 

The GSPN graph itself encodes causal dependency in a simulation. Connections between 
transitions and places form a bipartite graph. Enforcement of stoichiometry means that 
every transition takes tokens from input places and gives tokens to output places, so that 
edges of this graph are directed. Any transition whose bring would disable another transi¬ 
tion makes that other transition dependent. Stoichiometry records when removing a token 
disables a transition, so it encodes this statistical dependency. The converse statement is 
that transitions whose parent places are not shared cannot be dependent. 

The most immediate advantage of a clear dependence structure is computational effi¬ 
ciency. Because the directed bipartite graph limits possible causes and effects of any bring 
transition, the same graph enables efficient implementation of Gibson and Brack’s Next 
Reaction algorithm or Anderson’s method, which is an equivalent algorithm based on haz¬ 
ard rates. Both of these optimizations of Gillespie’s First Reaction method augment the 
stochastic process described here with added state describing a remaining integrated hazard 
for sampling each transition. Maintenance of this state imposes a requirement carefully 
ensured for long-lived competing processes, that any transition whose hazard rate or en¬ 
abling changes when another transition bres must be dependent on that transition. For 
the GSPN, this means it must stoichiometrically share an input place with the transition’s 
output places. An alternative implementation could separate this dependency list into its 
own directed graph among transitions. 

5. Conclusions 

Chemical kinetics simulations with exponentially-distributed transitions are well under¬ 
stood and commonly in use for ecology. The addition of time-dependent hazards and more 
general bring rules, as described here, introduces signibcantly more complexity for theory 
and implementation, but it allows design of simulations which more accurately obey both 
measurement and intuition. Individual behavior is neither measured nor imagined as hap¬ 
pening at constant rates, so the ability to model with time-dependent hazard rates avoids 
the more complicated question of what behavior a simulation represents. 

By offering a dehnition of an underlying stochastic process, this article sets ground 
rules so that it becomes possible to focus on ecological questions about a simulation. One 
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complaint about individual-based models is that the only specification for complex models is 
code [Hm 3T], and specification of an llcp is entirely separate from the code used to sample 
the process. Since hazard rates are central to simulation of an LLCP, survival analysis for 
complex systems becomes crucial. Techniques borrowed from survival analysis, such as 
proportional hazards and frailty models, provide known tools for estimation of parameters 
for a simulation. 

As noted in Grimm et al, a simulation requires a hierarchy of choices[7]. The stochastic 
process supports choice of who or what is included in the simulation and, above that, the 
behaviors under study. These, in turn, are mirrored in layers of software. The authors 
provide a sample implementation of llcp as a C++ library[25]. This, in turn, supports 
construction of an application which expresses an ecological model. The LLCP is a statistical 
process upon which to build modeling tools more focused on specific tasks, such as animal 
movement, epidemiology, and metapopulation analysis. 

The authors thank David J. Schneider and Christopher R. Myers for helpful discussions. 
This work was supported by the Science & Technology Directorate, Department of Homeland 
Security, via interagency Agreement No. HSHQDC-10-X- 00138. 


Appendix A. Derivation of Long-lived Competing Processes 


In order to affix notation, we define the continuous probability distribution of a non¬ 
negative, real-valued random variable A", denoted Fx(T), as Fx(T ) = V[X < T], Given 
that Fx(T is sufficiently smooth, its density is the derivative, fx(T) = dFx(T)/dT. The 
survival is the complement of the probability distribution, Gx(T) = 1 — F X (T). Lastly, the 
hazard rate is the ratio of density to survival, A x(T) = f x (T)/Gx(T). The hazard rate 
specifies the density 


fx(T) = A x (T) exp 



A x(s)ds 


(A.l) 


for differentiable probability distributions. Subject to the caveat of differentiability, there is 
no loss of generality in describing stochastic processes in terms of hazards. 

A long-lived competing process defines a physical state Si, composed of substates, s; = 
( Po,pi ,...). There are stochastic variables, L k , defined in absolute time, T, since the start 
of the simulation. The system state, J n , consists of both the physical state, s,, and the 
enabling times of each of the L k . The stochastic variables, their enabling rules, and their 
firing rules, form an operator on the state of the system. This operator, distinct from an 
initial state, is the model for all of what can happen in a simulation. 

At the moment of enabling a competing process, the survival distribution has the ex¬ 
pected form, GL k (T ) = V[L k > T\. Once any process has fired, those L k which remain 
enabled are now known not to have fired, so their probability distributions are rescaled. 
Defining a long-lived competing process by its hazard rate, the rescaled survival distribution 
integrates over future times, 


G Lk (T,T 0 ) = exp 



A Lk (s)ds , 


(A.2) 
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Each of these L k has a density 
T ' '-' jJ The 


where T 0 is the current time of the last observed event. 

defined in terms of its hazard in absolute time, fL k {T) = Ai, fe (T) ex P \-ti 0 \ Lk (s)ds 
joint density for these independent processes is their product, f(T) = YlkfL k (Tk)- 

The probability at time T n _ x of any next state, J n and time T n is a question of which of 
the L k fires first and when it fires. The time at which a particular process would fire, were 
it first to fire, corresponds to integrating the joint density over all ways in which every other 
process can fire later, 

roo roc 

qij{T) — / ■■■/ n kfL k (sk)ds k , (A.3) 

JT n JT 

where the integrals are over all but one of the stochastic processes. This quantity is called 
the semi-Markov density and, in terms of hazard rates, integrates to 


Qij ( T ) = >^L k (T)ex p 


L k (s)ds 


(A.4) 


The semi-Markov density, qij(T), is a joint discrete and continuous probability density. It 
will be incomplete if any of the stochastic variables L k are incomplete. This joint probability 
distribution can be split two ways into a marginal and conditional. Integrating the semi- 
Markov kernel over all time yields a stochastic matrix, 7ly,- = Qij(s)ds , which is the 
marginal probability of choosing a cause associated with a particular L k . If all causes which 
lead to the same final state are grouped together, then pij is the usual stochastic matrix 
used to determine the next state of the system. The conditional of p tJ is called the holding 
time, hij(T) = qij{T)/pij. This is the probability distribution for firing of a cause given that 
this cause will be the next to fire. Confusion of hij(T) and fj, k (T) will lead to trajectories 
which do not match input values. The other marginal distribution is the waiting time until 
the next event, Wij(T ) = J2j qij(T), obtained by summing over all causes which are enabled. 
The marginal for this is a time-dependent stochastic matrix, Pij(T ) = qij(T)/wij(T). 
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